Variance-penalized Markov decision processes: dynamic programming and reinforcement learning techniques
نویسنده
چکیده
In control systems theory, the Markov decision process (MDP) is a widely used optimization model involving selection of the optimal action in each state visited by a discrete-event system driven by Markov chains. The classical MDP model is suitable for an agent/decision-maker interested in maximizing expected revenues, but does not account for minimizing variability in the revenues. An MDP model in which the agent can maximize the revenues while simultaneously controlling the variance in the revenues is proposed. This work is rooted in machine learning/neural network concepts, where updating is based on system feedback and step sizes. First a Bellman equation for the problem is proposed. Thereafter, convergent dynamic programming and reinforcement learning techniques for solving the MDP are provided along with encouraging numerical results on a small MDP and a preventive maintenance problem.
منابع مشابه
Target-sensitive control of Markov and semi-Markov processes
We develop the theory for Markov and semi-Markov control using dynamic programming and reinforcement learning in which a form of semi-variance which computes the variability of rewards below a pre-specified target is penalized. The objective is to optimize a function of the rewards and risk where risk is penalized. Penalizing variance, which is popular in the literature, has some drawbacks that...
متن کامل2D1431 Machine Learning Lab 3: Reinforcement Learning
In this lab you will learn about dynamic programming and reinforcement learning. It is assumed that you are familiar with the basic concepts of reinforcement learning and that you have read chapter 13 in the course bookMachine Learning (Mitchell, 1997). The first four chapters of the survey on reinforcement learning by Kaelbling et al. (1996) is a good supplementary material. For further readin...
متن کامل2D1431 Machine Learning Lab 4: Reinforcement Learning
In this lab you will learn about dynamic programming and reinforcement learning. It is assumed that you are familiar with the basic concepts of reinforcement learning and that you have read chapter 13 in the course book Machine Learning (Mitchell, 1997). The first four chapters of the survey on reinforcement learning by Kaelbling et al. (1996) is a good supplementary material. For further readi...
متن کاملSolving Hidden-Mode Markov Decision Problems
Hidden-Mode Markov decision processes (HM-MDPs) are a novel mathematical framework for a subclass of nonstationary reinforcement learning problems where environment dynamics change over time according to a Markov process. HM-MDPs are a special case of partially observable Markov decision processes (POMDPs), and therefore nonstationary problems of this type can in principle be addressed indirect...
متن کاملA New Learning Algorithm for Optimal Stopping
A linear programming formulation of the optimal stopping problem for Markov decision processes is approximated using linear function approximation. Using this formulation, a reinforcement learning scheme based on a primal-dual method and incorporating a sampling device called ‘split sampling’ is proposed and analyzed. An illustrative example from option pricing is also included.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Int. J. General Systems
دوره 43 شماره
صفحات -
تاریخ انتشار 2014